28 research outputs found

    To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction

    Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. Developmental psychology has shown that infants acquire such skills from observation at a very early stage. In this paper, we contrast a traditional model-based route, with explicit 3D representations and physical simulation, with an end-to-end approach that directly predicts stability and related quantities from appearance. We ask whether, and to what extent and quality, such a skill can be acquired directly in a data-driven way, bypassing the need for explicit simulation. We present a learning-based approach, trained on simulated data, that predicts the stability of towers of wooden blocks under different conditions, as well as quantities related to the potential fall of the towers. The evaluation is carried out on synthetic data and compared to human judgments on the same stimuli.

    Airborne-Shadow: Towards Fine-Grained Shadow Detection in Aerial Imagery

    Shadow detection is the first step in shadow removal, which improves the understanding of complex urban scenes in aerial imagery for applications such as autonomous driving, infrastructure monitoring, and mapping. However, the limited annotation in existing datasets hinders the effectiveness of semantic segmentation and the ability of shadow removal algorithms to meet the fine-grained requirements of real-world applications. To address this problem, we present Airborne-Shadow (ASD), a meticulously annotated dataset for shadow detection in aerial imagery. Unlike existing datasets, ASD includes annotations for both heavy and light shadows, covering structures ranging from buildings and bridges to smaller details such as poles and fences. Accordingly, we define three shadow detection tasks: multi-class, single-class, and merged two-class segmentation. Extensive experiments show the challenges that state-of-the-art semantic segmentation and shadow detection algorithms face in handling different shadow sizes, scales, and fine details, while still achieving results comparable to conventional methods. We make the ASD dataset publicly available to encourage progress in shadow detection.

    Road condition assessment from aerial imagery using deep learning

    Terrestrial sensors are commonly used to inspect and document the condition of roads at regular intervals and according to defined rules. In Germany, for example, extensive data and information are obtained, stored in the Federal Road Information System, and made available in particular for deriving necessary decisions. Transverse and longitudinal evenness, for instance, are recorded by vehicles using laser techniques. To detect damage to the road surface, images are captured and recorded using area or line scan cameras. All these methods provide very accurate information about the condition of the road, but are time-consuming and costly. Aerial imagery (e.g. multi- or hyperspectral, SAR) provides an additional possibility for acquiring the specific parameters describing the condition of roads, yet a direct transfer from objects extractable from aerial imagery to the required objects or parameters that determine the condition of the road is difficult and in some cases impossible. In this work, we investigate the transferability of objects commonly used for terrestrial-based assessment of road surfaces to an aerial image-based assessment. In addition, we generated a suitable dataset and developed a deep learning based image segmentation method capable of extracting two relevant road condition parameters from high-resolution multispectral aerial imagery, namely cracks and working seams. The obtained results show that our models are able to extract these thin features from aerial images, indicating the possibility of more automated approaches for road surface condition assessment in the future.

    Segment-and-count: Vehicle Counting in Aerial Imagery using Atrous Convolutional Neural Networks

    High-resolution aerial imagery can provide detailed and in some cases even real-time information about traffic-related objects. Vehicle localization and counting using aerial imagery play an important role in a broad range of applications. Recently, convolutional neural networks (CNNs) with atrous convolution layers have shown better performance for semantic segmentation compared to conventional convolutional approaches. In this work, we propose a joint vehicle segmentation and counting method based on atrous convolutional layers. This method uses a multi-task loss function to simultaneously reduce pixel-wise segmentation and vehicle counting errors. In addition, the rectangular shapes of vehicle segmentations are refined using morphological operations. To evaluate the proposed methodology, we apply it to the public "DLR 3K" benchmark dataset, which contains aerial images with a ground sampling distance of 13 cm. Results show that our proposed method reaches 81.58% mean intersection over union in vehicle segmentation and an accuracy of 91.12% in vehicle counting, outperforming the baselines.
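The two ingredients named above, atrous (dilated) convolution and a joint segmentation-plus-counting objective, can be sketched in plain Python. This is a minimal 1-D illustration, not the paper's implementation; the function names and the weighting factor `alpha` are assumptions.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Dilated convolution: kernel taps are spaced `dilation` apart,
    enlarging the receptive field without adding parameters."""
    span = dilation * (len(kernel) - 1)
    out = []
    for i in range(len(signal) - span):
        out.append(sum(kernel[k] * signal[i + k * dilation]
                       for k in range(len(kernel))))
    return out

def multi_task_loss(seg_loss, count_loss, alpha=0.5):
    """Joint objective: pixel-wise segmentation error plus a weighted
    vehicle-count error, reduced simultaneously during training."""
    return seg_loss + alpha * count_loss

# With dilation=2, a 3-tap kernel covers a 5-sample window.
signal = [0, 1, 2, 3, 4, 5, 6, 7]
features = atrous_conv1d(signal, [1, 1, 1], dilation=2)
total = multi_task_loss(0.3, 0.2)
```

In a real network the dilated filters are stacked 2-D layers and both loss terms are differentiable; the sketch only shows how dilation widens the window and how the two errors combine into one objective.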

    Deep-Learning segmentation and 3D reconstruction of road markings using multi-view aerial imagery

    The 3D information of road infrastructure is gaining importance with the development of autonomous driving. In this context, the exact 2D position of road markings as well as their height information play an important role, e.g. in lane-accurate self-localization of autonomous vehicles. In this paper, the overall task is divided into automatic segmentation followed by a refined 3D reconstruction. For the segmentation task, we apply a wavelet-enhanced fully convolutional network to multi-view high-resolution aerial imagery. Based on the resulting 2D segments in the original images, we propose a successive workflow for the 3D reconstruction of road markings based on least-squares line fitting in multi-view imagery. The 3D reconstruction exploits the line character of road markings, optimizing the 3D line location by minimizing the distance from its back-projection to the detected 2D line in all covering images. Results show an improved IoU of the automatic road marking segmentation by exploiting the multi-view character of the aerial images, and a more accurate 3D reconstruction of the road surface compared to the Semi-Global Matching (SGM) algorithm. Furthermore, the approach avoids the matching problem in non-textured image parts and is not limited to lines of finite length. The approach is presented and validated on several aerial image datasets covering different scenarios such as motorways and urban regions.
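The least-squares line-fitting step can be illustrated with an ordinary 2-D fit; the paper's 3-D multi-view optimization minimizes back-projection distances instead, but the estimator has the same form. A minimal sketch with illustrative names, not the authors' code:

```python
def fit_line(points):
    """Fit y = a*x + b to 2-D points by ordinary least squares,
    minimizing the sum of squared vertical residuals."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Noise-free collinear points recover the line exactly: y = 2x + 1.
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
```

The multi-view version parameterizes a 3-D line and sums the squared distances between its projection and the detected 2-D segment in every covering image, which is what removes the dependence on texture-based matching.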

    Vehicle Occlusion Removal from Single Aerial Images Using Generative Adversarial Networks

    Removing occluding objects such as vehicles from drivable areas allows precise extraction of road boundaries and related semantic objects such as lane markings, which is crucial for applications such as generating high-definition maps for autonomous driving. Conventionally, multiple images of the same area, taken at different times or from various perspectives, are used to remove occlusions and reconstruct the occluded areas. Nevertheless, these approaches require large amounts of data, which are not always available. Furthermore, they do not work for static occlusions caused by, among others, parked vehicles. In this paper, we address occlusion removal based on single aerial images using generative adversarial networks (GANs), which are able to deal with the mentioned challenges. To this end, we adapt several state-of-the-art GAN-based image inpainting algorithms to reconstruct the missing information. Results indicate that the StructureFlow algorithm outperforms the competitors, and the restorations obtained are robust, with high visual fidelity in real-world applications. Furthermore, due to the lack of annotated aerial vehicle removal datasets, we generate a new dataset for training and validating the algorithms, the Aerial Vehicle Occlusion Removal (AVOR) dataset. To the best of our knowledge, our work is the first to address vehicle removal using deep learning algorithms to enhance maps.
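The inpainting setup described above, masking out vehicle pixels and reconstructing them from surrounding context, can be sketched as follows. A GAN generator performs the reconstruction in the paper; here a naive neighbour-average fill stands in purely to show the data flow, and all names and values are illustrative.

```python
def inpaint(image, mask):
    """Treat pixels where mask==1 as missing and fill each one with the
    mean of its unmasked 4-neighbours (a stand-in for the GAN generator)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                vals = [image[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]]
                out[y][x] = sum(vals) / len(vals) if vals else 0
    return out

# Toy 3x3 road patch: the dark centre pixel (a vehicle) is masked and filled.
road = [[100, 100, 100],
        [100,  30, 100],
        [100, 100, 100]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
restored = inpaint(road, mask)
```

A learned inpainter replaces the averaging with a generator trained adversarially, so it can hallucinate plausible road texture and markings rather than a flat blur.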

    Ad-hoc situational awareness during floods using remote sensing data and machine learning methods

    Recent advances in machine learning and the rise of new large-scale remote sensing datasets have opened new possibilities for automating remote sensing data analysis, making it possible to cope with the growing data volume and complexity and the inherent spatio-temporal dynamics of disaster situations. In this work, we provide insights into machine learning methods developed by the German Aerospace Center (DLR) for rapid mapping activities and used to support disaster response efforts during the 2021 flood in Western Germany. These include, specifically, methods for systematic flood monitoring from Sentinel-1 as well as road-network extraction, object detection, and damage assessment from very high-resolution optical satellite and aerial images. We discuss aspects of data acquisition and present results that were used by first responders during the flood disaster.

    Infrastructure and Traffic Monitoring in Aerial Imagery Using Deep Learning Methods

    Infrastructure and traffic monitoring are two of the most innovative applications for automatically extracting semantic information from aerial images. Related applications include urban and city planning, High-Definition (HD) mapping, parking-lot usage mapping, and disaster-management mapping for search and rescue operations, among others. HD maps are also used in autonomous driving as an additional source of information, as they provide fine-grained information about the location of objects. The best way to publicly disseminate spatial details about infrastructure components such as buildings, roads, parking lots, lane markings, and vegetation is through maps. The necessary data collection in the field (on the ground) for a larger area is costly, because terrestrial imagery requires the cartographer to visit the area in question. Aerial photography, on the other hand, offers a wealth of opportunities to remotely observe and map a large area in a short time. With an appropriate camera configuration and flight altitude, the resolution of aerial imagery reaches a few centimeters. For example, aerial imagery can be used to monitor traffic flow and quickly detect potential bottlenecks, accidents, congestion, and other features of interest over a large area. Additionally, the automatic detection of dynamic objects can help build more efficient roads, intersections, and highways, reduce congestion, and eliminate hazardous areas. The application is not limited to land transportation but can also be extended to maritime transportation. Other dynamic objects in mobility applications include bicyclists, motorcyclists, and pedestrians. A dynamic map and a static map can be combined using automatic aerial imagery analysis, resulting in a comprehensive map called a hybrid map. Initially, image analysis algorithms relied mainly on feature-driven methods. This work focuses on data-driven algorithms such as deep-learning methods that extract information with high accuracy while being transferable to other regions of interest. Objects with few pixels, complex backgrounds, different scales, low resolution, varying view angles, shadows, and occlusions make this task very challenging. This work aims to develop new deep-learning methods to automatically extract infrastructure and traffic monitoring information from aerial images. A total of five problems are addressed in this context. Two problems are related to automatic segmentation from aerial imagery, e.g., of roadway markings and other infrastructure-related objects for generating fine-grained HD maps. The other three problems are related to detecting and tracking vehicles in aerial imagery. The present work is cumulative in nature. All five problems are described in six peer-reviewed articles, summarized below.

    Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks (CNNs): Conventional maps describe infrastructure mainly from the perspective of the road. To achieve comprehensive monitoring of the infrastructure, a detailed map is required that includes, for example, detailed information about lane markings. These represent an essential and inseparable component of the road infrastructure. By automatically locating lane markings, it is possible to define road boundaries, analyze traffic behavior, and create HD maps for autonomous vehicles. The proposed method combines the wavelet transform (WT) with a CNN and enables direct extraction of lane markings with high accuracy and precision at high computational speed, without additional information or intermediate processing.

    SkyScapes - Fine-Grained Semantic Understanding of Aerial Scenes: A detailed map contains information about the different categories of infrastructure components such as buildings, sidewalks, and road markings. A new approach based on deep learning is presented that automatically extracts all relevant objects with pixel-level accuracy without any additional information. The algorithm performs cross-class and cross-task dense pixel-wise semantic segmentation for dense and angular segments. This work also presents a concept for the direct classification of road markings into multiple classes from aerial imagery, which is also applicable to satellite imagery. In addition, a proof of concept is provided for extracting entrances, exits, and hazardous areas. The proposed method outperformed many state-of-the-art algorithms at the time.

    Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery and EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery: In these two papers, a high-precision vehicle detector for aerial imagery is presented, together with a robust dataset for detecting vehicles under realistic conditions. Vehicles are annotated in different classes with driving directions and are detected automatically by the algorithm. The proposed method is a novel deep-learning architecture that localizes objects with horizontal and rotated bounding boxes to determine the exact position of objects of interest. The algorithm can also be applied to other objects such as boats and ships for maritime applications. In addition to traffic monitoring, the localization of individual infrastructure objects such as bridges, ports, traffic circles, tank farms, and several other classes is also demonstrated.

    ShuffleDet: Real-Time Vehicle Detection Network in Onboard Embedded UAV Imagery: Building on the previous work, a new deep-learning algorithm for vehicle detection is proposed, with low computational cost and performance comparable to more complex and heavy models. The processing method must be fast enough to run on an onboard computing platform, such as an Unmanned Aerial Vehicle (UAV).

    AerialMPTNet: Multi-Pedestrian and -Vehicle Tracking in Aerial Imagery Using Temporal and Graphical Features: These two papers address the problem of pedestrian and vehicle tracking in aerial sequences. The task is to determine pedestrians' and vehicles' position, speed, and acceleration for a comprehensive traffic monitoring system based on aerial image data. For this last link in the chain of a traffic monitoring system, a new multi-object tracking algorithm is presented that tracks individual objects in aerial images and provides their position in the current image, from which the speed and orientation of each vehicle or pedestrian can be extracted.
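The final step described above, deriving speed and orientation from per-frame tracker positions, reduces to simple kinematics once positions are geo-referenced. A sketch assuming an illustrative ground sampling distance and frame rate (both hypothetical values, not taken from the papers):

```python
import math

def motion(p_prev, p_curr, gsd_m=0.13, fps=2.0):
    """Speed (m/s) and heading (degrees) of one tracked object from its
    pixel positions in two consecutive geo-referenced frames."""
    dx = (p_curr[0] - p_prev[0]) * gsd_m  # pixel displacement -> metres
    dy = (p_curr[1] - p_prev[1]) * gsd_m
    dt = 1.0 / fps                        # time between frames in seconds
    speed = math.hypot(dx, dy) / dt
    heading = math.degrees(math.atan2(dy, dx))
    return speed, heading

# A vehicle that moved 10 px along x between two frames.
speed, heading = motion((100, 200), (110, 200))
```

The tracker supplies the positions; everything else (speed, heading, and by differencing speeds, acceleration) follows from the ground sampling distance and the sequence's frame rate.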

    ShuffleDet: Real-Time Vehicle Detection Network in On-board Embedded UAV Imagery

    On-board real-time vehicle detection is of great significance for UAVs and other embedded mobile platforms. We propose a computationally inexpensive detection network for vehicle detection in UAV imagery, which we call ShuffleDet. To enhance its speed, we construct our method primarily from channel shuffling and grouped convolutions. We apply inception modules and deformable modules to account for the size and geometric shape of the vehicles. ShuffleDet is evaluated on the CARPK and PUCPR+ datasets and compared against state-of-the-art real-time object detection networks. ShuffleDet requires only 3.8 GFLOPs while providing competitive performance on the test sets of both datasets. We show that our algorithm achieves real-time performance, running at 14 frames per second on an NVIDIA Jetson TX2, which demonstrates the high potential of this method for real-time processing in UAVs.
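The channel-shuffle operation mentioned above can be sketched in a few lines: after grouped convolutions, channels are interleaved across groups so that information mixes between them at negligible cost. This is a pure-Python list version for illustration; real implementations perform the same permutation as a reshape-transpose on tensors.

```python
def channel_shuffle(channels, groups):
    """Reshape (groups, n) -> transpose -> flatten: channel i of every
    group ends up adjacent, so the next grouped conv sees all groups."""
    n = len(channels) // groups
    grouped = [channels[g * n:(g + 1) * n] for g in range(groups)]
    return [grouped[g][i] for i in range(n) for g in range(groups)]

# Three groups of two channels each are interleaved.
shuffled = channel_shuffle(['a0', 'a1', 'b0', 'b1', 'c0', 'c1'], groups=3)
```

Without the shuffle, stacked grouped convolutions would keep each group's information isolated; the permutation is what lets the cheap grouped layers approximate a full convolution.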

    Multiple vehicle and people tracking in aerial imagery using stack of micro single-object-tracking CNNs

    Geo-referenced real-time vehicle and person tracking in aerial imagery has a variety of applications such as traffic and large-scale event monitoring, disaster management, and input to predictive traffic and crowd models. However, object tracking in aerial imagery remains an unsolved, challenging problem due to the tiny size of the objects, their differing scales, and the limited temporal resolution of geo-referenced datasets. In this work, we propose a new approach based on Convolutional Neural Networks (CNNs) to track multiple vehicles and people in aerial image sequences. As the large number of objects in aerial images can exponentially increase the processing demands in multiple-object tracking scenarios, the proposed approach utilizes a stack of micro CNNs, where each micro CNN is responsible for a single-object tracking task. We call our approach Stack of Micro-Single-Object-Tracking CNNs (SMSOT-CNN). More precisely, using a two-stream CNN, we extract a set of features from two consecutive frames for each object, given the location of the object in the previous frame. We then assign the extracted features of each object to one micro CNN, which predicts the object's location in the current frame. We train and validate the proposed approach on the vehicle and person sets of the KIT AIS dataset for object tracking in aerial image sequences. Results indicate accurate and time-efficient tracking of multiple vehicles and people by the proposed approach.
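The stack-of-micro-trackers control flow can be sketched as follows: one micro tracker per object, each solving a single-object task from that object's own features. The "tracker" below is a stub that applies a regressed offset; class and method names are assumptions for illustration, not the paper's API.

```python
class MicroTracker:
    """Stand-in for one micro single-object-tracking CNN."""
    def predict(self, prev_pos, features):
        # A real micro CNN would regress this offset from two-frame features.
        dx, dy = features
        return (prev_pos[0] + dx, prev_pos[1] + dy)

def track_all(prev_positions, per_object_features):
    """One micro tracker per object; the stack scales linearly with the
    number of objects instead of one heavy joint model."""
    trackers = [MicroTracker() for _ in prev_positions]
    return [t.predict(p, f)
            for t, p, f in zip(trackers, prev_positions, per_object_features)]

# Two objects, each with its own regressed displacement.
new_positions = track_all([(10, 10), (40, 25)], [(2, 0), (-1, 3)])
```

The point of the design is the per-object decomposition: each micro network only ever sees a small crop around its object, which keeps memory and compute bounded as the object count grows.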